STAT 211: Normal Distribution

Computing probabilities and quantiles from this important distribution

Darren Homrighausen

The Normal Distribution

Defining the normal distribution

The normal (also known as the ‘Gaussian’) distribution is written \(X\sim N(\mu,\sigma^2)\), where:

  • \(\mu\) is the expected value (\(E[X]\))
  • \(\sigma^2\) is the variance (\(Var[X]\))

Its pdf:

\[ f(x)=\frac1{(2\pi\sigma^2)^{1/2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]

(Just like the binomial, you don’t need to know this formula for this class)
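You won't need the formula, but a quick sanity check can make it concrete. This is a minimal sketch that codes the pdf directly and compares it with R's built-in `dnorm()`. The helper name `normalPdf` is hypothetical and not part of R.

```r
# Hypothetical helper: the normal pdf written out from the formula above
normalPdf = function(x, mu, sigmaSq) {
  1 / sqrt(2 * pi * sigmaSq) * exp(-(x - mu)^2 / (2 * sigmaSq))
}
# Density of N(5, 9) at x = 4, by hand and with dnorm()
# (note: dnorm() takes the standard deviation, not the variance)
normalPdf(4, 5, 9)
dnorm(4, 5, sqrt(9))
```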

Computing probabilities: old vs. new

Normal probabilities have no simple closed-form formula, so computing them requires numerical methods (in practice, computers)

In times of yore, this meant using tables:

Standard Normal Table for getting probabilities from Z-scores

Now, we can just use a computer:

pnorm(1.8,0,1)
[1] 0.9640697
pnorm(1.89,0,1)
[1] 0.970621
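A probability such as `pnorm(1.8,0,1)` is the area under the density to the left of 1.8. As a sketch, R's numerical integrator `integrate()` applied to `dnorm` should give (up to a small numerical error) the same number:

```r
# Area under the standard normal pdf from -Inf up to 1.8
integrate(dnorm, lower = -Inf, upper = 1.8)$value
# The same probability from the built-in cdf
pnorm(1.8, 0, 1)
```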

Computing Probabilities and Values

Computing probabilities

We will always write \(X\sim N(\mu,\sigma^2)\) for the normal

R's functions, however, take \(\mu\) and the standard deviation \(\sigma = \sqrt{\sigma^2}\), not the variance

Example

If \(X \sim N(5,9)\), then the probability that \(X\) is less than 4 is:

x       = 4
mu      = 5
sigmaSq = 9
pnorm(x, mu, sqrt(sigmaSq))
[1] 0.3694413
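A common mistake is passing the variance where R expects the standard deviation. This sketch contrasts the two calls; the second silently treats \(\sigma\) as 9 instead of 3 and returns a different (wrong) probability:

```r
x       = 4
mu      = 5
sigmaSq = 9
# Correct: the third argument is the standard deviation
pnorm(x, mu, sqrt(sigmaSq))
# Mistake: passing the variance means sigma = 9, a much wider normal
pnorm(x, mu, sigmaSq)
```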

Computing values (quantiles)

Continuing with \(X \sim N(5,9)\):

(that is, normal with expected value 5 and variance 9)

Recall:

For the value \(x = 4\), we computed \(P(X \leq 4)\):

pnorm(4,5,3)
[1] 0.3694413

Now, let’s ask the opposite question!

For a fixed probability \(p\), what is the value \(x\) so that \(P(X \leq x) = p\)?

This value is called a quantile

Example:

What \(x\) makes \(P(X\leq x) = .7\)?

qnorm(.7,5,3)
[1] 6.573202
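To see that `qnorm` really inverts `pnorm`, here is a sketch that solves \(P(X \leq x) - 0.7 = 0\) numerically with R's root finder `uniroot()`. The search interval \((-100, 100)\) is an arbitrary choice wide enough to contain the answer.

```r
# P(X <= x) - 0.7 for X ~ N(5, 9); its root is the 0.7 quantile
f = function(x) pnorm(x, 5, 3) - 0.7
root = uniroot(f, interval = c(-100, 100), tol = 1e-10)$root
root
# Compare with the built-in quantile function
qnorm(0.7, 5, 3)
```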

Using pnorm and qnorm

Concept Check: What does each of the following lines of code produce?

qnorm( pnorm(3,5,3), 5, 3)
[1] 3
pnorm( qnorm(.2,5,3), 5, 3)
[1] 0.2

Computing probabilities and values

Suppose \(X \sim N(-3.2, 100)\)

that is, normal with expected value -3.2 and variance 100

Probability: What is the probability that \(X\) equals 0?

This is always zero for a continuous random variable such as the normal!
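One way to see why: the probability of landing in a window \((-h, h]\) around 0 shrinks toward zero as the window shrinks. A minimal sketch, with window widths chosen arbitrarily:

```r
mu    = -3.2
sigma = sqrt(100)
h     = c(1, 0.1, 0.01, 0.001)
# P(-h < X <= h) for narrower and narrower windows around 0
probs = pnorm(h, mu, sigma) - pnorm(-h, mu, sigma)
probs
```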

Probability: What is the probability \(X\) is less than 0?

pnorm(0,-3.2,sqrt(100))
[1] 0.6255158

Values: at what value \(x\) does \(P(X\leq x) = .05\)?

qnorm(.05,-3.2,sqrt(100))
[1] -19.64854
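For "greater than" questions, take the complement of `pnorm`, or use its `lower.tail` argument. A short sketch for \(P(X > 0)\):

```r
mu    = -3.2
sigma = sqrt(100)
# P(X > 0) as the complement of P(X <= 0)
1 - pnorm(0, mu, sigma)
# The same probability using the upper tail directly
pnorm(0, mu, sigma, lower.tail = FALSE)
```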

Computing probabilities: SATs

Example

The scores on the SAT math section are \(X \sim N(520,3600)\)

(that is, normal with expected value 520 and variance 3600)

What is the probability someone scores less than 600?

\(P(X \leq 600) = \text{what probability?}\)

x       = 600
mu      = 520
sigmaSq = 3600
pnorm(x, mu, sqrt(sigmaSq))
[1] 0.9087888
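As an optional check, simulating many scores from \(N(520, 3600)\) and counting how many fall at or below 600 should land close to `pnorm`'s answer. This is a sketch; the seed `211` and the number of draws are arbitrary choices.

```r
set.seed(211)
# Simulate 100,000 SAT math scores from N(520, 3600)
sims = rnorm(100000, mean = 520, sd = sqrt(3600))
# Fraction of simulated scores at or below 600
mean(sims <= 600)
```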

Computing values: SATs

Example

The scores on the SAT math section are \(X \sim N(520,3600)\)

The stats dept. recruits students who score above the \(96^{th}\) percentile

What is the cutoff score (\(x\)) for recruitment by the stats dept.?

\(P(X \leq x) = 0.96\)

prob    = .96
mu      = 520
sigmaSq = 3600
qnorm(prob, mu, sqrt(sigmaSq))
[1] 625.0412
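"Above the \(96^{th}\) percentile" is the same as "in the top 4%", so the cutoff can also be found from the upper tail. This sketch checks both routes and confirms that the cutoff has probability .96 below it:

```r
mu    = 520
sigma = sqrt(3600)
# Cutoff with 4% of scores above it
qnorm(0.04, mu, sigma, lower.tail = FALSE)
# Check: probability of scoring at or below the cutoff is 0.96
pnorm(qnorm(0.96, mu, sigma), mu, sigma)
```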

Transformations

Transforming Normals: \(aX + b\)

Suppose again \(X \sim N(\mu, \sigma^2)\)

If we multiply \(X\) by a constant \(a \neq 0\) and add a constant \(b\) to form a new random variable \(Y\):

\[ Y = aX + b \]

then \(Y \sim N(a\mu + b, a^2 \sigma^2)\)

(This matches our earlier results \(E[aX+b] = aE[X] + b\) and \(Var[aX+b] = a^2 Var[X]\))

Example:

If \(X \sim N(150,100)\) then

\[ Y = 3X + 30 \sim N(480, 900) \]
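A simulation sketch of this example: draw from \(N(150,100)\), apply \(Y = 3X + 30\), and look at the sample mean and variance. The seed and sample size are arbitrary, so the results are only approximately 480 and 900.

```r
set.seed(211)
# X ~ N(150, 100), so sd = 10
x = rnorm(100000, mean = 150, sd = sqrt(100))
y = 3 * x + 30
mean(y)  # should be close to 3*150 + 30 = 480
var(y)   # should be close to 3^2 * 100 = 900
```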

Transforming Normals: inches to cm

Measuring to a shoulder on a machined part with a steel rule

We are measuring a machined part using a ruler on a .01” scale

If several people measure it, their readings (in inches) will include some random amount of measurement error:

\[ X \sim N(.875, .00005) \]

What is the probability that a measurement will be within \(\pm\).01 centimeters of the mean?

(An inch is 2.54 cm)

\(Y = 2.54X\), therefore \(Y \sim N(2.54 \cdot .875,\ 2.54^2 \cdot .00005) \approx N(2.2225,\ 0.00032)\)

\[\begin{align} P(2.2225 - .01 \leq Y \leq 2.2225 + .01) & = P(2.2125 \leq Y \leq 2.2325) \\ & = F(2.2325) - F(2.2125) \end{align}\]

mu      = 2.54*.875        # mean of Y (cm)
sigmaSq = 2.54^2*.00005    # variance of Y (cm^2)
# F(mu + .01) - F(mu - .01), where F is the cdf of Y
pnorm(mu + .01, mu, sqrt(sigmaSq)) - 
    pnorm(mu - .01, mu, sqrt(sigmaSq))
[1] 0.4223202
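Two equivalent routes give the same answer, sketched below: standardize the \(\pm\).01 cm window into a \(\pm z\) window for a standard normal, or convert the tolerance to inches and stay with \(X \sim N(.875, .00005)\).

```r
sigma = sqrt(2.54^2 * .00005)
# +/- .01 cm around the mean is +/- .01/sigma standard deviations
z = .01 / sigma
pnorm(z) - pnorm(-z)
# Equivalently, work in inches: +/- .01/2.54 inches around .875
pnorm(.875 + .01/2.54, .875, sqrt(.00005)) -
  pnorm(.875 - .01/2.54, .875, sqrt(.00005))
```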

The Standard Normal Distribution

When \(\mu=0\) and \(\sigma=1\), we call it the standard normal

\[ f(x)=\frac1{\sqrt{2\pi}}e^{-x^2/2},\,\,-\infty<x<\infty. \]

(Once again, you don’t need to know this formula for this class)

The standard normal will be notated as: \(Z \sim N(0,1)\)
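In R, the standard normal is the default: if you leave out the mean and standard deviation, `pnorm()` and `qnorm()` use `mean = 0` and `sd = 1`. A short sketch:

```r
# These two calls are identical
pnorm(1.8)
pnorm(1.8, 0, 1)
# The z-value with 97.5% of the standard normal below it
qnorm(0.975)
```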

Standardization

Going from a general normal to a standard normal is known as standardization

\[ X \sim N(\mu, \sigma^2) \longrightarrow Z = \frac{X - \mu}{\sigma} \sim N(0,1) \]

The reverse direction (unstandardization) also holds:

\[ Z \sim N(0, 1) \longrightarrow X = Z \sigma + \mu \sim N(\mu,\sigma^2) \]

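(both follow from the \(aX + b\) result in Transforming Normals: take \(a = 1/\sigma\), \(b = -\mu/\sigma\) to standardize, and \(a = \sigma\), \(b = \mu\) to unstandardize)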
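A sketch of standardization in action, reusing the SAT numbers: standardizing and then using the standard normal gives the same probability as working with \(N(520, 3600)\) directly, and unstandardizing a standard normal quantile gives the same value as `qnorm` on the original scale.

```r
x     = 600
mu    = 520
sigma = sqrt(3600)
# Standardize, then use the standard normal cdf
z = (x - mu) / sigma
pnorm(z)
# Same probability working directly with N(520, 3600)
pnorm(x, mu, sigma)
# Unstandardize a standard normal quantile back to X's scale
qnorm(0.96) * sigma + mu
```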

Z-scores: How to compare values

Z-scores: Comparing values

The z-score of a value \(x\) is:

\[ \text{z-score} = \frac{x - \mu}{\sigma} \]

(For a value \(x\) from a distribution with mean \(\mu\) and variance \(\sigma^2\))

Question: Given two values from two different distributions, which one is more unusual?

Answer: The one whose z-score has the larger magnitude

(that is, the larger |z-score|)

Example: We want to compare the Olympic records in men’s and women’s sprinting. Which is more unusual?

Let’s look at a table:

Table of Average and Variance of Olympic Sprinting Times (seconds; variance in seconds²)

Category   Men's   Women's
average    9.85    10.83
variance   .0057   .0049
record     9.63    10.61

Let’s compute the two z-scores:

zScoreM = (9.63 - 9.85)/sqrt(.0057)
zScoreM
[1] -2.913971
zScoreW = (10.61 - 10.83)/sqrt(.0049)
zScoreW
[1] -3.142857

The women’s result is more unusual

The z-score for men is -2.914 while for women it is -3.143
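The comparison can also be scripted. This is a minimal sketch; `zScore` is a hypothetical helper that takes the variance (as the table reports it) and converts it to a standard deviation internally.

```r
# Hypothetical helper: z-score from a value, a mean, and a variance
zScore = function(x, mu, sigmaSq) (x - mu) / sqrt(sigmaSq)
zM = zScore(9.63, 9.85, .0057)
zW = zScore(10.61, 10.83, .0049)
# The larger |z-score| marks the more unusual value
c(men = abs(zM), women = abs(zW))
ifelse(abs(zW) > abs(zM), "women", "men")
```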

Z-scores: Another application

Another use of z-scores: what value \(x\) in one distribution is as unusual as a given value in another?

Answer: Choose the \(x\) so that the z-scores are equal

Example: What time for a men’s sprinter would be equivalent to the female record?


We found the female z-score = \(\frac{10.61 - 10.83}{\sqrt{.0049}}\) = -3.143

To find the equivalent time, we need to unstandardize: \[ x = \text{z-score}\cdot\sigma + \mu \]

Here, use:

  • the z-score from the female sprinters
  • the \(\mu\) and \(\sigma\) from the male sprinters (\(\mu = 9.85\), \(\sigma^2 = .0057\), so \(\sigma = \sqrt{.0057}\))

x = zScoreW * sqrt(.0057) + 9.85
x
[1] 9.612719

The male record would need to be 9.613 seconds to be as unusual as the female record.
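The two steps (standardize in one distribution, unstandardize in the other) can be packaged together. A sketch with a hypothetical helper `equivalentValue`, again taking variances as inputs:

```r
# Hypothetical helper: the value in distribution 2 that has the
# same z-score as x1 has in distribution 1
equivalentValue = function(x1, mu1, sigmaSq1, mu2, sigmaSq2) {
  z = (x1 - mu1) / sqrt(sigmaSq1)  # standardize in distribution 1
  z * sqrt(sigmaSq2) + mu2         # unstandardize in distribution 2
}
# Men's time as unusual as the women's record
equivalentValue(10.61, 10.83, .0049, 9.85, .0057)
```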

Empirical rule

For every normal distribution, the probability of falling within \(k\) standard deviations of the mean is the same fixed amount

This is the empirical rule:

  • \(P( |Z| \leq 1) = 0.683\) (\(\sim \frac{2}{3}\) obs. within 1 \(\sigma\) of \(\mu\))
  • \(P( |Z| \leq 2) = 0.954\) (\(\sim \frac{19}{20}\) obs. within 2 \(\sigma\) of \(\mu\))
  • \(P( |Z| \leq 3) = 0.997\) (\(\sim \frac{997}{1000}\), nearly all obs., within 3 \(\sigma\) of \(\mu\))
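The empirical-rule probabilities can be reproduced with `pnorm`, using \(P(|Z| \leq k) = P(Z \leq k) - P(Z \leq -k)\). A short sketch:

```r
k = 1:3
# P(|Z| <= k) for k = 1, 2, 3
probs = pnorm(k) - pnorm(-k)
round(probs, 3)
```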


A (potentially) helpful applet

On exams, there will be a link to an applet: Probability applet

Note: This applet is totally optional and unnecessary if you use R


Returning to SAT example:

x       = 600
mu      = 520
sigmaSq = 3600
pnorm(x, mu, sqrt(sigmaSq))
[1] 0.9087888
qnorm(.96, mu, sqrt(sigmaSq))
[1] 625.0412

To use the applet

  • Probability: standardize first, giving the z-score \((600 - 520)/60 \approx 1.333\), then read the probability from the applet
  • Value: get the z-score for probability .96 from the applet, then unstandardize it to get the value:
1.751*sqrt(sigmaSq) + mu
[1] 625.06

(here, 1.751 came from the applet by using the probability .96 from SAT example)
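The applet's 1.751 is a rounded version of the exact standard normal quantile, which R gives directly. This sketch redoes both applet steps without the applet; the small difference from 625.06 comes only from that rounding.

```r
mu    = 520
sigma = sqrt(3600)
# Probability side: standardize 600, then use the standard normal cdf
pnorm((600 - mu) / sigma)
# Value side: exact z-value with 96% below it (the applet rounds to 1.751)
zCut = qnorm(0.96)
zCut
# Unstandardize to get the cutoff score
zCut * sigma + mu
```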

We’ll be back…

The normal plays a central role in probability and statistics

We will return to it later when we cover sampling distributions